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Abstract 

Ensemble data assimilation is a problem in determining the most 
likely phase space trajectory of a model of an observed dynamical sys- 
tem as it receives inputs from measurements passing information to 
the model. Using methods developed in statistical physics, we present 
effective actions and equations of motion for the mean orbits associ- 
ated with the temporal development of a dynamical model when it has 
errors, there is uncertainty in its initial state, and it receives informa- 
tion from measurements. If there are correlations among errors in the 
measurements they are naturally included in this approach. 

Assimilating the information in observed data into models of a dynamical 
system when there are errors in the measurements, errors in the models, 
and uncertainty about the precise state of a system when the assimilation 
process begins has stimulated discussions about placing data assimilation in 
a probabilistic setting [1, 2]. The goal is to calculate the probability of a 
model state vector y(t m ) to be at a certain location in state space at time t m 
conditioned on the observation of data at to, ti, t m . A recursive formula 
for this conditional probability has been discussed. We give an information 
theoretic interpretation of this recursion relation and indicate how it can 
be systematically approximated within a framework that is suggested by 
developments in statistical physics. While approximations can be made 
at almost any stage, we show how to systematically represent the effective 
action associated with the procedure. We also indicate how one can estimate 
unknown model parameters. 

This subject is broadly addressed in the sciences. We have examined spe- 
cific research seeking to estimate parameters and states in neurobiology [3], 
systems biology [4], atmospheric and oceanic sciences [5, 6, 7], biomedical 
engineering [8], cell biology [9], chemical engineering [10], toxicology [11], 



1 



coastal and estuarine modeling [12], wastewater treatment [13], biochem- 
istry [14], and immunology [15] as examples. The issue of constructing 
observers in control theory [16] also deals with state estimation from ob- 
served data. The literature has focused on disciplinary details of models, 
metrics for errors of model output, model errors, measurement errors, and 
numerical methods. 

We review the formulation of data assimilation in an ensemble or prob- 
abilistic sense to establish notation and to provide a framework for our 
subsequent discussion. We begin with an observed dynamical system with 
state variables x(i n ) = x(n). Over an observation or assimilation window at 
each of the discrete times t n : {to, t\, t m } L functions of the state variable 
zi (t n ) = hi(x(t n )), ZL(t n ) = /iz,(x(t n )) are reported. This provides us 
with m+1 observed L-dimensional vectors z(t n ) = z(n). From physical ar- 
guments we construct a D-dimensional model of this dynamical system with 
state variables w(£ n ) = w(n) and with a dynamical rule w — ► g(w, p) tak- 
ing the state at time t n to the state at time t n+ \ : w(n + 1) = g(w(n),p). 
P = {Pi,P2, ■■■} are fixed parameters of the model. D is typically much 
greater than L. 

We wish to choose the model, the initial conditions w(0), and the pa- 
rameters p so that at the measurement times the model state is such that 
hi(w(n)) = zi(n); I = 1,2, ...,L. Because D 3> L, we must estimate the un- 
observed state variables as well as any unknown model parameters in order 
to predict forward from the end of the data window t > t m . 

Now change variables from w-space to yi(n) = hi(w(ri)); I = 1,2, ...,L 
and y a (n) = w a (n); a = L + 1, L + 2, ...,D. The dynamics in y space is 
y(n+ 1) = f (y(n), p). Fixed parameters in the functions y\ = hi(w) are now 
included in the dynamical map y — > f(y,p), and any estimation procedure 
may be asked to determine them as well. 

There are errors in the observations. At each time to,ti, ...,t m there is 
a distribution of possible observations from which the measured values are 
drawn. Since there are many realizations of zi(to), there are many possible 
trajectories for the model to track. 

We want to evaluate the conditional probability that at time t m the state 
of the model is y(m) given the specific sequence of observations z(0),z(l), z(m). 
z(n) = [zi(n), Zi(ra)]. The conditional probability is denoted P(y(m)\z(m), z(m— 
1), z(0)). Its dependence on p and y(0) is not shown. Introducing 
Z(m) = {z(m), z(m — 1), z(0)}, write P(y(m)|z(m), z(m — 1), z(0)) = 
P(y(m)|Z(m)). 
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By the definition of conditional probabilities we have the identity 

Pfr(m)|Z(m)) = n»(m)|y(m) Z(m - l))P(y(m)|Z(m - 1)) 

P(z(m)[Z(m — 1)) 

which is a known, useful result [7, 17, 18]. We reproduce it to introduce our 
notation and to provide our starting point. 
It is informative to rewrite this identity as 

P(y(m)|Z(m)) = 

P(y(m), z(m)|Z(m — 1)) 



P(z(m)|Z(m - l))P(y(m)|Z(m - 1)) 
P(y(m)|Z(m-l)). (2) 

The coefficient in curly brackets is the exponential of the conditional mu- 
tual information between the L-dimensional measurement z(m) and the D- 
dimensional model output y(m), conditioned on the previous observations 
Z(m-l) [19]. We call it 

.,w / x / Mr./ , r P(y(m),z(m)|Z(m — 1)) , 

M7(y(m),z(m)|Z(m - 1)) = log[- 



L P(z(m)|Z(m - l))P(y(m)|Z(m - 1)) J ' 

(3) 

This answers the question: how much (in bits using log 2 ) do we learn 
about y(m) from observing z(m), given we have already observed Z(m — 1). 
MJ(y(0),z(0)|Z(-l))] = logt pgfff^) ]- 

One often approximates the conditional mutual information assuming 
that errors in measurements are not correlated at different observation times. 
This makes M7(y(m), z(m)|Z(m — 1)) independent of Z(m — 1), but the 
assumption is not needed for the general discussion. Including such correla- 
tions may be physically important [17]. 

We have assumed that the description of the dynamics by the state 
vector y(m) represents the full set of dynamical variables. Thus the process 
y(m) — > y(m + 1) is Markov: y(m + 1) depends only on y(m). We may 
use the Chapman-Kolmogorov equation [20] to write P(y(m)|Z(m — 1)) = 
/ d D y{m — l)P(y(m)|y(m — l))P(y(m — l)|Z(m — 1)). For deterministic 
dynamics the transition matrix P(y(n+l)|y(n)) = 5 D (y(n+l) — f (y(n), p)). 

Combining these statements, we write a recursion relation for moving 
forward in time P(y(m)|Z(m)) = exp[M7(y(m), z(m)|Z(m — 1))] / d D y(m — 
l)P(y(m)|y(m - l))P(y(m - l)\Z(m - 1)). 

Iterating back to to we have 

P(y(m)|Z(m)) = 
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/m—1 
H d D y(n)exp[TMI(Y,Z{m))} 
n 



n=0 

P(y(n + l)|y(n))P(y(0)), (4) 

with Y = {y(m), y(m-l), y(0)} and TM/(Y, Z(m)) = En=o M/(y(n), z(n))|Z(n- 
1))- 

TMI is the sum of the conditional mutual information associated with 
the observations z(n) and the model states y(n), conditioned on the obser- 
vations previous to that time location. It represents the total information 
passed to the sequence of states Y by the measurements at Z(m). 

This is our basic formula for assimilating data measured at {to, t±, t m } 
into guiding a model trajectory y(n) and providing that model with infor- 
mation on any unknown parameters in the model and on the values of the 
unobserved state variables of the model. 

Introduce the cumulant generating function 

exp[C m (K)] = (5) 

m m 

IJ d D y(n) exp[]T k(n) • y(n)] exp[-A (Y, Z(m))], 



/ 



n=0 n=0 



where 



-Ao(Y, Z(m)) = TMI(Y, Z(m)) (6) 

m—1 

+ 53 log[P(y(n + l)|y(n))] + log[P(y(0))], 

ra=0 

and this is an 'action' for motion along the orbit Y. We have associated 
a k(n) with each location in time where a measurement is made. The col- 
lection of these is denoted by K = {k(m), k(m — 1), k(0)}. In statistical 
physics and quantum theory these are currents that produce excitations 
from the ground state. Here they are useful tools for analyzing the system 
at hand. 

The conditional mean orbit is found as < y(n) > = ^jw^ ^ Ik=0' anc ^ ^ e 
moments about this mean are determined by higher derivatives of C m (K) 
at K = 0.. 

It is very useful to define the K dependent trajectory <p(n) via <j>(n) = 

^r4~r^ and with this to make the transformation of variables from the 
9K(n) 

multipliers K to $ = {<f)(m),<j>(m — 1), 0(0)} via the definition of the 
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effective action [21] A(<fr,Z(m)) 

m 

A(*,Z(m)) = -C ro (K) + £k(n)-0(n). (7) 

n=0 

It is familiar that M( *ffi) m)) = fc a (n) and the inverse of is 
Z(m\) h , - WW) 

7(V, f\in))an,bn' ~ d<f> a (n)d<f> b (ri) ■ 

Significant benefit comes from using the form of -A(<1>) in the formula for 
C m (K) to find 

exp[-A(*,Z(m))]= (8) 

m m 

II d D 2/(n) exp[^ k(n) • (y(n) - 0(n))] 

n=0 n=0 

exp[-A)(Y,Z(m))], 

Considering the orbit <fi(n) as a kind of base trajectory, change integration 
variables to if)(n) = y(n) — <f>(ri), denote * = {i/)(m),i/)(m — 1), i/>(0)}, 
and set A 1 (&, Z(m)) = A(<&, Z(m)) - A (*, Z(m)) leading to 

exp[-A 1 (*,Z(m))]= / n^(n)exp[f;^|^Ml-^(n)] 

J n=0 n=0 <•><?> W 

exp[-{A (* + *, Z(m)) - A (*, Z(m)) 

If the * are small corrections to the base trajectory we can expand 
the exponent in the integrand in powers of the ij){ri), and to second order 
we have 

exp[-Ai(*,Z(m))] = 

. m j 

/ Y[d D ip(n)exp[-- 1pa(n)lo(®, Z ( m ))an,bn'M n ') 
n=0 an,bn' 

where 7 (*, Z(m)) a „ j6ri / = f^^^ ■ Performing the Gaussian integral 
leads to the differential equation 

2A 1 (*,Z(m)) = A - To" 1 ^^^)) U ^ 
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with A = log 



det7o(^>,Z(m)) 

( 27r )(m + l)D 



We look for a solution to this differential equation as a power series in <1>. 
The scalars Ai($>) and A(<&) (suppressing the dependence on Z(m)) have 

the expansions Ai(<&) = ao + ai* + *a2$/2 H and A(<fr) = Ao + Ai$ + 

<l>A2<I?/2 + • • • to 2 nd order in <I>. ao, Ao are scalars, a±, A\ are vectors, and 
a2,A2 are matrices. Each of these quantities depends on other aspects of 
the specific Ao(3>, Z(m)), including Z(m). 

Comparing the terms in the differential equation results in 2ao = Ao — 
ai7 _1 ai, 2ai(X + 7 7 1 a2) = Ai, and lai^Z + 7 ~ 1 O2) = A 2 . 2T is the unit 
matrix. One may check that if Ai = and A2 = 0, meaning that ^4o(^) is 
quadratic, then consistently a\ = ai = 0, and all probabilities are Gaussian. 
Higher order terms are determined by continuing the power series expansion. 

More accurate approximations to Equation (9) permit refinement of the 
differential equation for Ai(&, Z(m)). 

Solving for Ai(&,Z(m)) allows us to address important questions such 
as the mean orbit during assimilation and the covariances about this mean. 
The mean trajectory through model phase space, call it < y a (n) > = v a (n), 
is determined by 

&4(*,Z(m)) 

This stationarity of the effective action replaces the standard statement 
called 4D-VAR in the geosciences literature for determining the optimal orbit 
of the model system [7]. 4DVAR uses Ao(<&, Z(m)) alone. The covariance 
about v(n) is given by the inverse of the matrix 7(V,Z(m)) where V = 
{v(m),v(m — 1), v(0)}. Higher moments are found through derivatives 
of A(&, Z(m)) evaluated along the orbit $ = V. 

The mean phase space point reached at the end of the data assimilation 
window is v(m) and the D x D covariance about this mean location is 
7~ (V,Z(m)) am,bm- These, and possibly higher moments about v(m), are of 
physical interest, and any question of whether P(y(m)|Z(m)) is Gaussian [7], 
unlikely for any Aq(Y ,Z(m)) of interest, is not under consideration. 

If there are no model errors, then P(y(n + l)|y(n)) = 5 D (y(n + 1) - 
f(y( n ))P))) an d we can carry out all integrals in the expression for C m (K) 
except that over y(0). This gives us 

/m 
A(0)expEk(n).y(n)] 

n=0 

exp[-< s (Y,Z(m))], (12) 
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where Aq E (Y, Z(m)) is the same as Aq(Y, Z(m)) without the transition ma- 
trices and with y(n) = fW(y(0),p) and Y = {f (m) (y(0), P ), f^^O), P ), ...,y(0)}. 
This says that the trajectory starting from any of the points y(0) is carried 
to y(l) = f(y(0),p), and to y(m) = f(y(m — l),p). All statistical 
information arises from P(y(0)). 

This is essentially the situation defined by [22] for differential equations 
and without assimilation of data. For discrete time dynamics, we have 

< y a {n+l) >= 

< /a(y(n),p) >= {/a(^y,p)exp[^ E (K)]}| K=0 , (13) 

and this translates into an effective equation of motion for v(n): v(n + 1) = 
f £ (v(n),p), where i E (<j>, p) is defined by a power series of derivatives on 
functions of 0, then we use $ = V. For example, if f(y,p) is quadratic in 
the state variables y, so (a,b,c = 1, ...,£>) / a (y,p) = 4i(p) + B ab (p)y b + 
C ab c(p)ybyc then /f (v(n),p) = A a (p) + B ab (p)v b {n) + C abc {p)[v b {n)v c (n) + 
7~ 1 (V, Z(m)) bniCn ]. A quadratic nonlinearity is likely to be interesting for 
many statistical fluid dynamics problems. 

When there are model errors, we may incorporate them, to the extent 
they lead to reduced phase space resolution in the dynamics, by approxi- 
mating the delta function transition probabilities as distributed Gaussians. 

A summary of the path traversed here is that we followed routes es- 
tablished in statistical physics by transforming from a cumulant generating 
function for the mean orbit < y(n) > and covariances about that mean, to 
an effective action which lends itself to relative ease in its evaluation. We 
coupled these methods to the problem of estimating all state variables of 
a model in a setting where we have information from noisy measurements 
of a sparse subset of these, uncertainty about the model parameters, and 
uncertainty of the initial state when the measurements begin. 

For times later than the observation window {to, ...,t m }, all properties 
of the distribution of model variables P(y(tM > t m )); M > m are deter- 
mined by Equation (9) with TMI = for all times greater than t m where 
there are no new observations. We never have to explicitly evaluate the 
conditional probability distribution as the effective action provides us with 
the measurable expected values and covariances. We can add information 
from observations after t m through the addition of a nonzero conditional 
mutual information term whenever a measurement is provided using the 
same general formulation. 

In the case of continuous labels in space and time for our dynamical vari- 
ables the equation for Ai(&, Z(m)) would be functional. Indeed, attending 
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to the fact that measurements are discrete in time and that numerical eval- 
uations of model state variables are also discrete in time, we have avoided 
many difficult mathematical questions. These have been addressed by Eyink 
and collaborators [23] . The transition from our formulation of dynamics as 
a set of ordinary differential equations, discretized in time to iterated maps, 
to partial differential equations conveying information both in continuous 
space labels and in continuous time is not difficult in a formal sense. In- 
deed, statistical field theories often start from that perspective and simplify 
to our viewpoint. As measurements are actually performed in discretized 
time and coarse grained space, there may be advantage to beginning as we 
do. 

The statistical physics framework we pose does suggest another route 
for exploration of its properties. One expects that as the sampling time and 
the spatial sampling lengths used in our discrete map dynamical equations 
become small, there will be scales of resolution below which there is not any 
improvement in the underlying physics. This suggests that a renormalization 
group [21] formulation to express this independence of small scale phenom- 
ena may yield valuable approximations to the methods explored here. 

Until this point, we have focused on the estimation of state variables 
by using probes K of the location of a model orbit as it incorporates infor- 
mation from measurements. One may estimate the fixed parameters in the 
model as well by asking that the effective action A(<&) which is a function 

of those parameters also satisfy ^^'^ > ' > =0. A formal argument can be 
given treating the fixed parameters as time dependent vectors p(n) satis- 
fying p(n + 1) = p(n), but it illuminates little here. A more informative 
statement is that requiring the effective action to be stationary in the val- 
ues of the fixed parameters is equivalent to asking that we maximize the 
total mutual information transferred from the measurements to the state of 
a model subject to imposition of the equations of motion y — > f(y,p). 

In our introduction we indicated that these methods may find use in 
the analysis of problems in a large number of different fields. The detailed 
association of equations of motion, representation of quantity and quality 
and statistics of measurements and their errors, and uncertainties in preci- 
sion of initial states will vary with the specifics of the scientific arena being 
explored. 

Finally, the search over states and parameters to realize aj4 ^'P' ) = 

and \v a (n) = 0) when the dynamics leads to chaotic motions, requires 

regularization to permit numerical methods to be accurate [24, 25] and to 
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allow estimation of the unobserved variables. 
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